[WIP] Support qwen3-omni #4411

Draft
CUHKSZzxy wants to merge 22 commits into InternLM:main from CUHKSZzxy:support-qwen3-omni
Conversation


CUHKSZzxy (Collaborator) commented Mar 13, 2026

Summary

Support Qwen3-Omni thinker inference in the PyTorch backend.

This PR adds Qwen3-Omni model registration, HF processor integration, and multimodal preprocessing for image, video, audio, and mixed image/audio/video inputs. Audio input support is currently limited to Qwen3-Omni.

Changes

  • Add Qwen3-Omni PyTorch thinker model support.
  • Add Qwen3-Omni VL preprocessor using the shared get_input_prompt -> preprocess path.
  • Support image-only, video-only, audio-only, and mixed image/audio/video prompts.
  • Keep Qwen3-Omni video expansion as whole-video spans, distinct from Qwen3VL per-frame timestamp handling.
  • Add audio media parsing for OpenAI-style multimodal messages.
  • Add multimodal input docs and examples, including Qwen3-Omni audio usage.
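To make the audio media parsing change concrete, here is a minimal sketch of building an OpenAI-style multimodal user message that mixes image, audio, and text content parts. The `audio_url` part type mirrors the existing `image_url`/`video_url` conventions and is an assumption for illustration, as are the helper name and example URLs; consult the docs added in this PR for the exact accepted schema.

```python
def make_mixed_message(text, image_url=None, audio_url=None, video_url=None):
    """Build one OpenAI-style multimodal user message.

    Each media kind becomes a typed content part; the text prompt
    goes last, matching common OpenAI-style message layouts.
    """
    content = []
    if image_url:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    if audio_url:
        # Assumed part type, by analogy with image_url/video_url.
        content.append({"type": "audio_url", "audio_url": {"url": audio_url}})
    if video_url:
        content.append({"type": "video_url", "video_url": {"url": video_url}})
    content.append({"type": "text", "text": text})
    return {"role": "user", "content": content}


# Hypothetical example inputs for a mixed image/audio prompt.
msg = make_mixed_message(
    "Describe what you see and hear.",
    image_url="https://example.com/cat.jpg",
    audio_url="https://example.com/meow.wav",
)
```

A message like this would then be passed as part of the usual `messages` list to the pipeline or the OpenAI-compatible server endpoint.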

Notes

  • Talker/audio-generation support is not included.
  • Audio input support is scoped to Qwen3-Omni.
  • Advanced use_audio_in_video=True interleaving is not enabled in this patch.

Related

Prerequisite PR

CUHKSZzxy force-pushed the support-qwen3-omni branch from 8d64a7a to 4c6bc99 on March 19, 2026 at 07:13.